Abstract

A quantum random number generator (QRNG) can, in principle, generate random numbers whose randomness is information-theoretically provable. Unfortunately, in practice, the quantum randomness is inevitably mixed with classical noise. To distill the quantum randomness, one needs to quantify the randomness of the source and then apply a randomness extractor. Here, we propose a framework for evaluating the quantum randomness of a physical device by its min-entropy. In our post-processing, we implement two information-theoretically provable extractors, a Toeplitz-hashing extractor and Trevisan's extractor, which achieve generation rates of 441 kb/s and 0.7 kb/s, respectively. The outputs of both extractors pass all the standard randomness tests we applied. This is the first time that such extractors have been proposed and implemented for a QRNG.
I. INTRODUCTION



Random numbers play a crucial role in many fields of science, technology, and industry, for instance, cryptography, statistics, scientific simulations, and lotteries [1-3]. Pseudo-random number generators (pseudo-RNGs) based on computational complexity have been well developed over the past few decades [4] and can generate random numbers at high speed and extremely low cost. However, the main drawback of pseudo-RNGs is that the generated randomness is not information-theoretically provable. In fact, every (software-based) pseudo-RNG is a deterministic algorithm, so its output can in principle be predicted given sufficient computational power. This pseudo-randomness can cause problems in many applications, such as cryptography. Recently, Microsoft confirmed that Windows XP contains RNG bugs [40], and security flaws have been found in online encryption methods due to imperfections in random number generation [41].

To address the security issues introduced by pseudo-RNGs, physical RNGs have been developed [5-7]. In particular, the probabilistic nature of quantum mechanics offers a natural way to build an information-theoretically provable RNG [5], i.e., a quantum random number generator (QRNG). We remark that physical RNGs have been included in microprocessors [42], although the randomness they generate is not quantum mechanical in nature [43].



In theory, a QRNG can produce random numbers with provable randomness. In practice, however, the quantum signals (the source of true randomness) are inevitably mixed with classical noise. An adversary (Eve) can, in principle, control the classical noise and thereby gain partial information about the raw random numbers. Therefore, it is necessary to apply a post-processing procedure that distills out true randomness about which Eve has almost no information. This distillation procedure is called randomness extraction and is realized by randomness extractors, which distill the true randomness and eliminate the effect of the classical noise. The goal of a randomness extractor [8, 9] is to extract (almost) perfect randomness from the raw data generated by a practical QRNG with the help of a short random seed. Such a seed requires a second source of randomness. A key input parameter of a randomness extractor is the min-entropy (see Definition I.1) of the raw data.

Previously, some simple post-processing methods have been widely used for QRNGs. For example, an exclusive-OR (XOR) operation has been employed in the literature [10, 11]: the raw data are divided into two bit strings and a bitwise XOR is performed between them. In addition, a least-significant-bits operation [3, 7] or non-universal hashing functions [12] have been proposed and implemented in QRNGs. These operations can certainly refine the raw data so that they pass some statistical randomness tests. However, the key point is that the generated randomness is not information-theoretically provable. Recently, a more sophisticated randomness extraction procedure was proposed in [13], which quantifies the randomness by Shannon entropy instead of min-entropy and applies non-universal hash functions for extraction. Unfortunately, the randomness extracted there is still not information-theoretically provable, for two reasons: randomness cannot be properly quantified by Shannon entropy [15], and the randomness from non-universal hashing functions relies on computational assumptions [44].

In this paper, we present a framework to quantify the randomness of the quantum signals from a QRNG by min-entropy and discuss how one can evaluate the min-entropy by developing a model of the physical device. We propose and implement randomness extractors to extract the true randomness. Based on a few reasonable assumptions on the physical model of the QRNG (see Section II B), the randomness extracted in the post-processing is information-theoretically provable. The main objective of this work is to develop a scheme that processes the raw data from a QRNG into random numbers that (nearly) follow a uniform distribution. Our post-processing scheme consists of three steps:

1. model and characterize the QRNG setup by performing the necessary measurements (see Section II A);

2. quantify the quantum randomness of the raw data with min-entropy (see Section II B);

3. apply a randomness extractor (see Sections III A and III B).



Randomness extractors can also be used for privacy amplification [16] in quantum key distribution (QKD). Note that privacy amplification is a crucial step in QKD post-processing. A few randomness extractors have been proven to be secure against quantum side channels [17]. The main advantage is that no (or little) classical communication is required for privacy amplification. Applying the techniques developed for randomness extraction to privacy amplification is an interesting topic for future research.
The rest of the paper is organized as follows. We introduce the relevant notation and definitions in the remainder of this section. In Section II, we present a procedure to evaluate the min-entropy of the quantum signals. We implement a universal hashing (Toeplitz-hashing) extractor in Section III A and Trevisan's extractor in Section III B. In Section IV, we show the results of statistical tests. We conclude in Section V.



A. Notations and definitions 

Notations: $U_d$ denotes the uniform distribution on $\{0,1\}^d$; $\log$ denotes the logarithm to base 2. The outcome of an ideal RNG, described by a random variable, follows a uniform distribution.







Min-entropy is widely used to quantify the randomness of a probability distribution.



Definition I.1. (Min-entropy) The min-entropy of a probability distribution $X$ on $\{0,1\}^n$ is defined by

$$H_\infty(X) = -\log\left(\max_{v \in \{0,1\}^n} \Pr[X = v]\right). \qquad (1)$$
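To make Definition I.1 concrete, here is a minimal Python sketch that evaluates the min-entropy of a finite distribution supplied as a list of probabilities. The helper name `min_entropy` is ours, chosen for illustration only.

```python
import math

def min_entropy(probs):
    """Min-entropy H_inf(X) = -log2(max_v Pr[X = v]) of a finite distribution."""
    if not probs or any(p < 0 for p in probs):
        raise ValueError("need a non-empty list of non-negative probabilities")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -math.log2(max(probs))

# Uniform distribution on {0,1}^8: every outcome has probability 2^-8.
print(min_entropy([1 / 256] * 256))
# Biased bit with Pr[0] = 3/4: only -log2(3/4) bits of min-entropy.
print(min_entropy([0.75, 0.25]))
```

Because only the most likely outcome matters, min-entropy never exceeds Shannon entropy, which is why it gives the conservative figure an extractor needs.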

In cryptography, the deviation of a practical protocol from an ideal protocol is characterized by a security parameter $\epsilon$. The statistical distance is commonly used as a standard security measure.

Definition I.2. ($\epsilon$-close) Two probability distributions $X$ and $Y$ over the same domain $T$ are $\epsilon$-close if the statistical distance between them is bounded by $\epsilon$,

$$\|X - Y\| = \max_{S \subseteq T} \big|\Pr[X \in S] - \Pr[Y \in S]\big| = \frac{1}{2}\sum_{v \in T} \big|\Pr[X = v] - \Pr[Y = v]\big| \le \epsilon. \qquad (2)$$
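The two expressions in Eq. (2) can be checked against each other numerically. The sketch below (hypothetical helper names; distributions given as Python dicts mapping outcomes to probabilities) computes the half-$L_1$ form directly and the maximum over events by brute force, which is only feasible for tiny supports.

```python
from itertools import combinations

def statistical_distance(p, q):
    """(1/2) * sum_v |p(v) - q(v)| for distributions given as {outcome: probability}."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def max_event_gap(p, q):
    """max over events S of |Pr[X in S] - Pr[Y in S]|, by brute force over all subsets."""
    support = sorted(set(p) | set(q))
    best = 0.0
    for size in range(len(support) + 1):
        for event in combinations(support, size):
            gap = sum(p.get(v, 0.0) - q.get(v, 0.0) for v in event)
            best = max(best, abs(gap))
    return best

x = {0: 0.5, 1: 0.3, 2: 0.2}
y = {v: 1 / 3 for v in range(3)}  # uniform on three outcomes
print(statistical_distance(x, y), max_event_gap(x, y))  # both about 1/6
```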

Roughly speaking, when $X$ is $\epsilon$-close to $Y$, $X$ is indistinguishable from $Y$ except with a small probability $\epsilon$. For example, the output of a practical RNG is said to be $\epsilon$-close to that of an ideal RNG if it satisfies Definition I.2. We emphasize that the security parameters from Definition I.2 are composable. The notion of composability was first proposed in classical cryptography for the study of security when classical cryptographic protocols are composed in a complex manner [18, 19]. It was introduced into quantum cryptography by [20, 21].
Definition I.3. (Extractor) A $(k,\epsilon,n,d,m)$-extractor is a function

$$\mathrm{Ext}: \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m, \qquad (3)$$

such that for every probability distribution $X$ on $\{0,1\}^n$ with $H_\infty(X) \ge k$, the probability distribution $\mathrm{Ext}(X, U_d)$ is $\epsilon$-close to the uniform distribution on $\{0,1\}^m$.

In short, an extractor is a function that takes a short seed of $d$ bits and a partially random source of $n$ bits and outputs an almost perfectly random bit string of $m$ bits.

Definition I.4. (Strong extractor) A $(k,\epsilon,n,d,m)$-strong extractor $\mathrm{Ext}(X,U_d)$ is an extractor such that the probability distribution $\mathrm{Ext}(X,U_d)\circ U_d$ (the output concatenated with the seed) is $\epsilon$-close to the uniform distribution on $\{0,1\}^{m+d}$.

Note that the key advantage of a strong extractor is that the (random) input seed can be reused, with the security parameter increased by $\epsilon$ for each reuse. Thus, one can partition the output of a practical RNG into (small) blocks and process them with a strong extractor using the same seed.

Definition I.5. (Universal hashing) A family of hash functions $\mathcal{H}$, mapping $S$ to $T$, is two-universal if

$$\Pr_{h \in \mathcal{H}}\big[h(x) = h(y)\big] \le \frac{1}{|T|} \qquad (4)$$

for all $x \neq y \in S$.



II. QUANTUM RANDOMNESS EVALUATION 



In this section, we provide a general framework to evaluate the quantum (true) randomness in a practical QRNG. The QRNG developed in Refs. [11, 22] is discussed to illustrate the evaluation process. We remark that the evaluation procedure can be applied to other QRNGs with certain modifications.



A. Physical model 

In general, the random numbers of a QRNG come from a certain quantum measurement. We refer to the measurement outcome as the quantum signal. This quantum signal is inevitably mixed with classical noise, such as background detections and electronic noise. From a cryptographic point of view, this classical noise might be known to (or even manipulated by) Eve. Hence, the main objective of post-processing for a QRNG is to extract the quantum (true) randomness and eliminate the contributions of classical noise.

Let us consider a generic flow chart of a QRNG, as shown in Fig. 1. First, a quantum state, which is the source of true randomness, is prepared. Then, a measurement is performed on the quantum state. Finally, the raw data are post-processed by a randomness extractor to generate truly random numbers. For example, in [22], the quantum state, which essentially characterizes the random phases of the photons from spontaneous emission, is prepared by operating a laser near its threshold level. The measurement is performed by a delayed self-heterodyning system. The raw data are evaluated based on a physical model and processed by two randomness extractors.



FIG. 1: A generic schematic diagram of a QRNG setup. A quantum state is prepared and measured. Then the outcome is processed by an extractor to generate random numbers.

B. Quantum randomness evaluation 

The key parameter we need to evaluate here is the min-entropy (defined in Eq. (1)) of the quantum signal contained in the raw data. In the following, we present a method to evaluate the min-entropy by deriving the full probability distribution of the quantum signal.

Let us take the QRNG setup in [22] as an example. The details of the setup can be found in Appendix A. The assumptions of the physical model needed to derive the probability distribution of the quantum signal are listed as follows.

1. The quantum signal is independent of the classical noise.

2. The quantum signal follows a Gaussian distribution. The analog signal is digitized by an analog-to-digital converter (ADC).

3. The quantum-signal to classical-noise ratio, denoted by $\gamma$, can be determined.

4. The total signal variance, i.e., the variance of the mixture of quantum signal and classical noise, can be characterized by sampling; it is denoted by $\sigma^2_{\rm total}$.

Note that the last assumption can be satisfied when the sequence of raw data is independent and identically distributed (i.i.d.).

To derive the probability distribution of the quantum signal, the key step is to find its variance. This is done by measuring the total variance of the raw data and the quantum-signal to classical-noise ratio. That is, with assumptions 1, 3, and 4, one can easily derive the quantum variance,

$$\sigma^2_{\rm quantum} = \frac{\gamma}{1+\gamma}\,\sigma^2_{\rm total}. \qquad (5)$$

From the variance together with the Gaussian distribution assumption, one can get the 
whole probability distribution of the quantum signal. 
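For completeness, the algebra behind Eq. (5) is spelled out below. It assumes, as in Eq. (A2), that $\gamma$ is the ratio of the quantum-signal variance to the classical-noise variance, and it uses assumption 1 so that the two variances add.

```latex
\begin{aligned}
\sigma^2_{\rm total} &= \sigma^2_{\rm quantum} + \sigma^2_{\rm classical}
  && \text{(independent contributions: variances add)} \\
\sigma^2_{\rm classical} &= \sigma^2_{\rm quantum}/\gamma
  && \text{(definition of } \gamma\text{)} \\
\Rightarrow\ \sigma^2_{\rm total} &= \sigma^2_{\rm quantum}\,\frac{1+\gamma}{\gamma}
  \quad\Rightarrow\quad
  \sigma^2_{\rm quantum} = \frac{\gamma}{1+\gamma}\,\sigma^2_{\rm total}.
\end{aligned}
```

For instance, with the value $\gamma = 21$ reported in Appendix A, the quantum signal accounts for $21/22 \approx 95\%$ of the total variance.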



Since the analog signal is sampled by an 8-bit ADC to generate digital bits [22], one can evaluate the probability distribution of the digitized output on $\{0,1\}^8$, given the Gaussian distribution of the quantum signal. Then the min-entropy of the quantum signal can be derived from Definition I.1. Following the detailed calculation procedure in Ref. [22], a min-entropy of 6.7 bits per 8-bit raw sample is obtained.
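As a rough illustration of this step, the sketch below digitizes a zero-mean Gaussian with an idealized 8-bit ADC and takes $-\log$ of the largest bin probability. The ADC model (256 equal-width bins on a symmetric range, with out-of-range values clipped into the end bins) and all numerical values are our assumptions, not the calibration of Ref. [22], so it does not by itself reproduce the 6.7-bit figure.

```python
import math

def gaussian_cdf(v, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0))))

def adc_min_entropy(sigma, full_scale, bits=8):
    """Min-entropy (bits/sample) of a zero-mean Gaussian signal read by an ideal
    `bits`-bit ADC with equal bins on [-full_scale, full_scale).
    Assumption: out-of-range voltages are clipped into the first/last bin."""
    levels = 2 ** bits
    width = 2.0 * full_scale / levels
    largest = 0.0
    for i in range(levels):
        lo = -full_scale + i * width
        p_lo = 0.0 if i == 0 else gaussian_cdf(lo, sigma)
        p_hi = 1.0 if i == levels - 1 else gaussian_cdf(lo + width, sigma)
        largest = max(largest, p_hi - p_lo)
    return -math.log2(largest)

# Hypothetical numbers: ADC full scale of +/-1 (arbitrary units) and a
# quantum-signal standard deviation of 0.25 in the same units.
print(adc_min_entropy(sigma=0.25, full_scale=1.0))
```

In practice $\sigma$ would be $\sqrt{\sigma^2_{\rm quantum}}$ from Eq. (5), expressed in ADC units; widening the signal relative to the bin width raises the min-entropy until clipping at the rails starts to dominate.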

C. Upper bound of randomness 

The randomness of a given QRNG setup is a limited resource. This can be shown by providing an upper bound, say via the Shannon entropy, on the randomness that one can extract from the measurement outcome [45]. The upper bound also indicates how much margin is left for further improvement in post-processing. Here, we give an example to show how one can evaluate the upper bound on the entropy for a practical QRNG setup.

Again, let us take the same setup as an example. The quantum signal is measured by a photodetector (PD). Given a perfect photon-number-resolving detector, the upper bound on the min-entropy is determined by the photon number within the detection time window. The laser power used in the setup is 0.95 mW [46], which corresponds to $1.5 \times 10^6$ photons at 1550 nm within a 200 ps detection time window. Thus, the maximal entropy of a sample from the PD can be estimated by $\log(1.5 \times 10^6) = 20.5$ bits, which is the upper bound on the min-entropy of the QRNG source.
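The photon-number estimate above can be reproduced from $N = P\,\Delta t\,\lambda/(hc)$. This short Python check uses the exact SI values of $h$ and $c$ together with the power, wavelength, and window quoted in the text, and gives about $1.48 \times 10^6$ photons and $\log N \approx 20.5$ bits.

```python
import math

PLANCK = 6.62607015e-34      # J*s (exact SI value)
LIGHT_SPEED = 299_792_458.0  # m/s (exact SI value)

def photons_in_window(power_w, wavelength_m, window_s):
    """Mean photon number in a detection window: P * dt / (h c / lambda)."""
    photon_energy = PLANCK * LIGHT_SPEED / wavelength_m
    return power_w * window_s / photon_energy

n_photons = photons_in_window(0.95e-3, 1550e-9, 200e-12)
upper_bound_bits = math.log2(n_photons)
print(f"{n_photons:.3g} photons, log2 = {upper_bound_bits:.1f} bits")
```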



III. RANDOMNESS EXTRACTION 

In this section, we will present two different randomness extractors, universal hashing 
and Trevisan's extractor, to process the raw data from a QRNG. 

A. Universal hashing 

Owing to the similarity between the definitions of extractors and privacy amplification [16], any privacy amplification scheme can, in principle, be used as an extractor. However, there is one subtle difference. In privacy amplification, the random seed (public randomness) is assumed to be free, whereas in an extractor one needs to take the seed into account, as it consumes random bits. Therefore, a direct transplant of privacy amplification schemes may not work for randomness extraction. In fact, for a popular universal hashing function, Toeplitz hashing [23, 24], the random seed used to construct a Toeplitz matrix is longer than the output string. This means that no net randomness can be extracted if universal hashing is directly used for randomness extraction. To overcome this problem, one needs to prove that the privacy amplification scheme constructs a strong extractor (see Definition I.4), thus allowing the seed to be reused in subsequent applications. Fortunately, extractors constructed from universal hashing functions [25] (see Definition I.5) can easily be proven to be strong extractors by the Leftover Hash Lemma.



Lemma III.1. (Leftover Hash Lemma) Let $\mathcal{H} = \{h_1, h_2, \ldots, h_{2^d}\}$ be a (two-)universal hashing family mapping $\{0,1\}^n$ to $\{0,1\}^m$, and let $X$ be a probability distribution on $\{0,1\}^n$ with $H_\infty(X) \ge k$. Then, for $x$ drawn from $X$ and $h_y \in \mathcal{H}$ with $y$ drawn from $U_d$, the probability distribution of $h_y(x) \circ y$ is $\epsilon$-close to $U_{m+d}$ with $\epsilon = 2^{(m-k)/2}$. That is, it forms a $(k, 2^{(m-k)/2}, n, d, m)$-strong extractor.
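Solving the error term of Lemma III.1 for the output length makes explicit how much output a target security parameter allows; this is the rule applied in the Toeplitz-hashing procedure of this section.

```latex
\epsilon = 2^{(m-k)/2}
\;\Longleftrightarrow\; \log\frac{1}{\epsilon} = \frac{k-m}{2}
\;\Longleftrightarrow\; m = k - 2\log\frac{1}{\epsilon}.
```

For example, giving up $k - m = 200$ bits of min-entropy yields $\epsilon = 2^{-100}$, and each halving of $\epsilon$ costs two output bits.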

We remark that Lemma III.1 also implies that the Toeplitz matrix can be reused in the privacy amplification of QKD. One can then use a private key (as a seed) to construct the Toeplitz matrix for privacy amplification without compromising (much of) the privacy of the seed. Hence, reusing the seed can save the classical communication for privacy amplification, which is normally required in standard QKD post-processing [26]. It is also practically beneficial for privacy amplification to divide the raw key data into small blocks and apply a small Toeplitz matrix to each block individually. However, the finite-size effect of a small block can significantly lower the privacy amplification efficiency. This issue is an interesting research topic for future study.



Here, we use Toeplitz matrices to construct universal hashing functions [23, 24] and implement the Toeplitz-hashing extractor. A Toeplitz matrix of dimension $n \times m$ requires only the specification of its first row and first column; the other elements of the matrix are determined by descending diagonally from left to right. Thus, the total number of random bits required to construct (choose) a Toeplitz matrix is $n + m - 1$.
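The sketch below builds a Toeplitz matrix from $n + m - 1$ seed bits, stored with one row per output bit, and, for tiny sizes, checks Definition I.5 by brute force over all seeds. The indexing convention `T[i][j] = seed[i - j + n - 1]` is one choice among several and is our assumption; any convention in which each diagonal is fixed by one seed bit works the same way.

```python
from itertools import product

def toeplitz_matrix(seed, n, m):
    """m x n Toeplitz matrix (one row per output bit) from n + m - 1 seed bits.

    Assumed convention: T[i][j] = seed[i - j + n - 1], so every diagonal
    (fixed i - j) is set by a single seed bit."""
    if len(seed) != n + m - 1:
        raise ValueError("seed must contain n + m - 1 bits")
    return [[seed[i - j + n - 1] for j in range(n)] for i in range(m)]

def hash_gf2(matrix, x):
    """Matrix-vector product modulo 2."""
    return tuple(sum(t & b for t, b in zip(row, x)) % 2 for row in matrix)

def worst_collision_probability(n, m):
    """max over x != y of Pr_seed[T x = T y], enumerating all 2^(n+m-1) seeds."""
    matrices = [toeplitz_matrix(s, n, m) for s in product((0, 1), repeat=n + m - 1)]
    inputs = list(product((0, 1), repeat=n))
    worst = 0.0
    for a, x in enumerate(inputs):
        for y in inputs[a + 1:]:
            hits = sum(hash_gf2(T, x) == hash_gf2(T, y) for T in matrices)
            worst = max(worst, hits / len(matrices))
    return worst

print(worst_collision_probability(4, 2))  # Eq. (4) requires at most 1/2**2
```

For every pair $x \neq y$, exactly a $2^{-m}$ fraction of the seeds causes a collision, so the Toeplitz family meets Eq. (4) with equality.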

The procedure of the Toeplitz-hashing extractor is as follows.

1. Given raw data of size $n$ with min-entropy $k$ and a security parameter $\epsilon$, determine the output length to be

$$m = k - 2\log\frac{1}{\epsilon}. \qquad (6)$$

2. Construct a Toeplitz matrix with an $(n + m - 1)$-bit random seed. For demonstration purposes, we use pseudo-random numbers for this step.

3. Obtain the extracted random bit string by multiplying the Toeplitz matrix with the raw data (modulo 2).
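A minimal Python sketch of steps 1-3 follows, assuming the raw data are already available as a list of bits and the min-entropy $k$ of the whole block is known. The helper names are hypothetical; the seed here comes from Python's `secrets` module rather than the pseudo-random numbers used in the demonstration, and no attempt is made to match the speed of the 441 kbits/s implementation.

```python
import math
import secrets

def output_length(k, log2_eps):
    """Step 1: m = k - 2*log2(1/eps), rounded down (log2_eps = log2(eps) < 0)."""
    return math.floor(k + 2 * log2_eps)

def toeplitz_extract(raw_bits, seed_bits, m):
    """Steps 2-3: hash n raw bits to m bits with the Toeplitz matrix defined by
    n + m - 1 seed bits, T[i][j] = seed[i - j + n - 1], over GF(2)."""
    n = len(raw_bits)
    if m <= 0:
        raise ValueError("no extractable output for this min-entropy and epsilon")
    if len(seed_bits) != n + m - 1:
        raise ValueError(f"need {n + m - 1} seed bits, got {len(seed_bits)}")
    x_int = sum(bit << j for j, bit in enumerate(raw_bits))  # bit j <-> raw_bits[j]
    out = []
    for i in range(m):
        row_int = sum(seed_bits[i - j + n - 1] << j for j in range(n))  # row i of T
        out.append(bin(row_int & x_int).count("1") & 1)  # parity = GF(2) dot product
    return out

# Usage with stand-in data (assumed values, not the QRNG's):
n = 64
k = n * 6.7 / 8                        # assumed min-entropy of the block
m = output_length(k, log2_eps=-10)     # epsilon = 2^-10, for a quick demo only
raw = [secrets.randbits(1) for _ in range(n)]           # placeholder for raw QRNG bits
seed = [secrets.randbits(1) for _ in range(n + m - 1)]  # seed from the OS CSPRNG
bits = toeplitz_extract(raw, seed, m)
print(m, len(bits))
```

Packing each row into a Python integer turns the inner product into a bitwise AND plus a parity count, which keeps the sketch short without changing the arithmetic.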



We implement the Toeplitz-hashing extractor for the QRNG presented in [22]. As mentioned in Section II, the min-entropy of the raw data is lower bounded by 6.7 bits per 8-bit sample. With an input bit-string length of $2^{12} = 4096$, the min-entropy of an input block is $4096 \times 6.7/8 \approx 3430$ bits. Thus, we use a $4096 \times 3230$ Toeplitz matrix for randomness extraction, which results in $\epsilon \le 2^{-100}$ according to Eq. (6). Our implementation of the Toeplitz-hashing extractor achieves a generation rate of 441 kbits/s [47].
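The block parameters quoted above can be checked with a few lines of arithmetic. The sketch below recomputes the min-entropy per block, the security parameter implied by the Leftover Hash Lemma, and the seed length $n + m - 1$ of the $4096 \times 3230$ Toeplitz matrix.

```python
n = 2 ** 12               # raw bits per block
k = n * 6.7 / 8           # min-entropy per block at 6.7 bits per 8-bit sample
m = 3230                  # output bits per block (Toeplitz matrix width)
log2_eps = (m - k) / 2    # Leftover Hash Lemma: eps = 2^((m - k)/2)
seed_bits = n + m - 1     # seed bits that specify the Toeplitz matrix
print(f"k = {k:.1f} bits, log2(eps) = {log2_eps:.1f}, seed = {seed_bits} bits, "
      f"output/input = {m / n:.3f}")
```

Since the 7325-bit seed exceeds the 3230-bit output of a single block, a net gain of randomness arises only because the strong-extractor property lets the same seed be reused across blocks (Definition I.4).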

The extracted bit sequence passes all the tests of the Diehard, NIST, and TestU01 (SmallCrush) statistical test suites (see Section IV and Appendix C).

We note that Toeplitz-matrix hashing has recently been implemented for QKD privacy amplification with a block size exceeding $10^6$ [28]. As discussed above, privacy amplification requires a large block size due to the finite-size key effect [29], whereas in randomness extraction a small block size only reduces the efficiency. Nevertheless, the technique developed in [28] could be useful for extractor implementations as well, which we leave for future investigation.



B. Trevisan's extractor 



Trevisan proposed an approach to construct randomness extractors based on pseudo-random number generators [9]. Here, we implement its improved version by Raz, Reingold, and Vadhan [30]. There are two main steps in constructing Trevisan's extractor: an error correction code and a combinatorial design. The error correction code is constructed by concatenating a Reed-Solomon code with a Hadamard code, as shown in Appendix A of Ref. [31]. For the combinatorial design part, we implement a refined version of the Nisan-Wigderson design [32]. Note that, with certain modifications of the security parameters, Trevisan's extractor can also be proven to be a strong extractor (see Theorem 22 in [31]).

The detailed implementation of Trevisan's extractor is presented in Appendix B.



IV. RANDOMNESS TESTS



A. Statistical tests



We apply three standard statistical test suites, Diehard [48], NIST [49], and TestU01 [33], to evaluate our results. First, the raw data from the QRNG [22] do not pass the statistical tests, due to the classical noise mixed in the raw data and the fact that the as-obtained quantum signals follow a Gaussian distribution instead of a uniform distribution. Second, the random numbers from a pseudo-RNG cannot pass all the tests, which exposes their underlying determinism. Finally, we repeatedly apply the Toeplitz-hashing extractor and Trevisan's extractor to our raw data. The outputs of both extractors pass all the standard statistical tests, which indicates that our post-processing is effective in extracting uniform randomness from a weakly random source. All the test results are shown in Appendix C.
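For readers unfamiliar with these suites, the simplest NIST test, the frequency (monobit) test, is sketched below: it maps bits to $\pm 1$, sums them, and converts the normalized sum into a p-value with the complementary error function; NIST's usual convention treats $p < 0.01$ as a failure. This illustrates one test only and is not a substitute for the full Diehard, NIST, or TestU01 suites.

```python
import math

def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test.

    S_n = sum of (2b - 1), s_obs = |S_n| / sqrt(n), p = erfc(s_obs / sqrt(2))."""
    n = len(bits)
    if n == 0:
        raise ValueError("empty sequence")
    s_n = sum(2 * b - 1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

print(monobit_p_value([0, 1] * 500))  # perfectly balanced: p = 1
print(monobit_p_value([1] * 100))     # all ones: p far below 0.01
```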
B. Autocorrelation

An alternative approach to verifying randomness is to evaluate the autocorrelation. The autocorrelations of the raw data are shown in Fig. 2(a) (between bits) and Fig. 2(b) (between samples). From Fig. 2(a), we can see that the autocorrelation is significant only within an 8-bit sample and drops to the vicinity of $1 \times 10^{-3}$ beyond it. The low values of the autocorrelation between samples (Fig. 2(b)) support the assumption that the sequence of raw data is i.i.d. (see Section II B). We remark that, due to the finite bandwidth of a practical detector and statistical fluctuations, the autocorrelation stays around $1 \times 10^{-3}$ and never drops to 0.

After post-processing by either Trevisan's extractor or the Toeplitz-hashing extractor, not only is the correlation within 8 bits (from a sample digitized by an 8-bit ADC) eliminated, but the autocorrelation beyond 8 bits also drops to about $1 \times 10^{-5}$. The autocorrelations of the post-processed outputs are shown in Fig. 2(c) and Fig. 2(d), where the low residual values indicate the good randomness of our extracted results. More details of the autocorrelation analysis are given in Appendix D.
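The exact normalization behind Fig. 2 is not spelled out here, so the sketch below uses one common definition of the normalized autocorrelation coefficient at a given lag (our assumption). It also shows two reference behaviours: a strongly periodic sequence gives coefficients near $\pm 1$, while pseudo-random bits give values that shrink roughly like $1/\sqrt{N}$.

```python
import random

def autocorrelation(bits, lag):
    """Normalized autocorrelation at `lag`: sum of centered products over the
    overlapping part, divided by the total centered sum of squares."""
    n = len(bits)
    if not 0 < lag < n:
        raise ValueError("need 0 < lag < len(bits)")
    mean = sum(bits) / n
    centered = [b - mean for b in bits]
    norm = sum(c * c for c in centered)
    if norm == 0:
        raise ValueError("constant sequence: autocorrelation undefined")
    return sum(centered[i] * centered[i + lag] for i in range(n - lag)) / norm

rng = random.Random(7)                    # fixed seed, pseudo-random test data
noise = [rng.randrange(2) for _ in range(20000)]
alternating = [0, 1] * 500                # maximally correlated toy sequence
print([round(autocorrelation(noise, lag), 4) for lag in range(1, 9)])
print(autocorrelation(alternating, 1), autocorrelation(alternating, 2))
```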

V. CONCLUDING REMARKS

We modeled a QRNG to evaluate the min-entropy of the quantum source and implemented two extractors, the Toeplitz-hashing extractor and Trevisan's extractor, to extract the true randomness. The random numbers obtained at the end of post-processing passed all the tests of Diehard, NIST, and TestU01.



FIG. 2: Autocorrelation evaluation results. All normalized correlations are evaluated from a 10 Mb record of the raw data. (a) Autocorrelation of the raw data (between bits). The average value of the autocorrelation coefficient is $9.5 \times 10^{-4}$. The most significant correlations are within 8 bits, due to the use of an 8-bit ADC. (b) Autocorrelation of the raw data (between samples). The average value is $4.9 \times 10^{-4}$. (c) Autocorrelation of the outputs of the Toeplitz-hashing extractor. The average value is $-1.0 \times 10^{-5}$. (d) Autocorrelation of the outputs of Trevisan's extractor. The average value is $1.6 \times 10^{-5}$. In theory, for a truly random $10 \times 10^6$-bit string, the average normalized correlation coefficient is zero, with a standard deviation of $4 \times 10^{-4}$.

Appendix A: QRNG Setup 

The QRNG scheme based on measuring the quantum phase fluctuations of a laser is presented in [11, 22]. It is well known that the fundamental phase fluctuations of a laser can be attributed to spontaneous emission, which is quantum mechanical in nature [34]. By operating the laser at a low intensity level, the quantum phase fluctuations can be measured by a delayed self-heterodyning system [35], and the measurement outcome is processed to generate truly random numbers.
The schematic diagram is shown in Fig. 3. A semiconductor laser operating near its threshold is used as the source of quantum phase fluctuations. A Mach-Zehnder interferometer (MZI) combined with a photodetector (PD) is employed to convert phase fluctuations into voltage fluctuations, which are further digitized by an ADC to generate random bits.



[Figure 3 diagram: Laser source → MZI → PD → ADC]



FIG. 3: Schematic diagram of the QRNG setup of [22]. The quantum phase fluctuations of a laser operating near its threshold level are measured by a Mach-Zehnder interferometer (MZI) and a photodetector (PD). The PD output is converted to raw data (random bits) by an analog-to-digital converter (ADC).



The variance of the output AC voltage $V(t)$ from the photodetector can be described by [22]

$$\langle V(t)^2 \rangle = A P^2\left(\frac{Q}{P} + C\right) + F = AQP + ACP^2 + F, \qquad (A1)$$

where $A$ is a constant determined by the gain of the photodetector, $P$ is the laser emission power, $Q/P$ is the variance of the laser quantum phase fluctuations, which can be treated as Gaussian white noise [36], $C$ is the variance of the laser classical phase noise [37], and $F$ represents the background noise of the detection system.

In Eq. (A1), the term $AQP$ quantifies the quantum signal, which generates the true randomness of the QRNG. The term $ACP^2 + F$ quantifies the classical noise, which could potentially introduce bias into the random numbers and leak information through side channels. In principle, the amount of extractable quantum randomness is independent of the classical noise. In practice, however, it is challenging to extract a small quantum signal on top of a large classical noise background due to the limited resolution and dynamic range of a detection system. To generate high-quality random numbers, we would like to maximize the quantum signal while keeping the classical noise as low as possible. One commonly used figure of merit in signal processing is the signal-to-noise ratio. In this QRNG, the quantum signal to classical noise ratio is given by

$$\gamma = \frac{AQP}{ACP^2 + F}. \qquad (A2)$$



FIG. 4: Lower bound of the min-entropy. The optimal laser power is around 0.95 mW and the 
corresponding min-entropy is 6.7 bits per 8-bit raw sample (from an 8-bit ADC). 



As presented in Ref. [22], the parameters $AQ$, $AC$, and $F$ can be determined experimentally, as shown in Table I. Based on Eq. (A2) and Table I, the maximal signal-to-noise ratio is $\gamma = 21$, which is achieved at a laser emission power of $P = 0.95$ mW.



| F (mV²) | AQ (mV²/mW) | AC (mV²/mW²) |
|---|---|---|
| 0.36 ± 0.06 | 16.12 ± 0.49 | 0.40 ± 0.16 |

TABLE I: Experimental results (with 0.99 confidence intervals) for the parameters in Eq. (A1) [22].
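The optimum quoted above follows from Eq. (A2): setting $d\gamma/dP = 0$ gives $ACP^2 = F$, i.e., $P^{*} = \sqrt{F/(AC)}$ and $\gamma_{\max} = AQ/(2\sqrt{AC\,F})$. The sketch below evaluates these with the central values of Table I, ignoring their confidence intervals.

```python
import math

F = 0.36    # mV^2, background noise (Table I, central value)
AQ = 16.12  # mV^2/mW, quantum-term coefficient (Table I, central value)
AC = 0.40   # mV^2/mW^2, classical phase-noise coefficient (Table I, central value)

def gamma(power_mw):
    """Quantum-to-classical noise ratio of Eq. (A2): AQ*P / (AC*P^2 + F)."""
    return AQ * power_mw / (AC * power_mw ** 2 + F)

# d(gamma)/dP = 0  <=>  AC * P^2 = F, so the optimum is P* = sqrt(F / AC).
p_opt = math.sqrt(F / AC)
gamma_max = gamma(p_opt)  # equals AQ / (2 * sqrt(AC * F))
print(f"P* = {p_opt:.3f} mW, gamma_max = {gamma_max:.2f}")
```

With these values the optimum lands at about 0.95 mW with $\gamma_{\max} \approx 21$, consistent with the numbers stated above.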



For the QRNG setup, we lower bound the min-entropy of the quantum signal at different laser emission powers in Fig. 4. The details of the min-entropy evaluation procedure are described in Ref. [22]. Fig. 4 shows that the optimal laser power is around 0.95 mW and the corresponding min-entropy of the quantum signal is 6.7 bits per 8-bit raw sample (from an 8-bit ADC). The min-entropy is stable for laser powers larger than 0.9 mW.
Appendix B: Trevisan's extractor
1. Procedure 



We input an $n_i$-bit string of raw data from a QRNG, with min-entropy at least $k$, and a $d$-bit random seed $y$ into a $(k, \epsilon, n_i, d, n_f)$-extractor, constructed by combining an $(n_i, 1/2 - \epsilon/(4n_f))$-error correction code and an $(m_e, \rho)$-design, which then outputs an $n_f$-bit string that is $\epsilon$-close to a uniform distribution. Here, $k$, $\epsilon$, and $n_i$ are determined by the source and the practical use of the random numbers. We need to determine $d$ (as small as possible) and $n_f$ (as large as possible).

1. Map the input $n_i$-bit string to an $n$-bit string according to the $(n_i, 1/2 - \epsilon/(4n_f))$-error correction code. Here, $n$ can be assumed to be a power of 2 [9]. In practice, one can concatenate a Reed-Solomon code and a Hadamard code (see Appendix A of [31]), where the codeword length is given by

$$n = 2^{2m_e}, \qquad m_e = \left\lceil \log n_i + 2\log n_f - 2\log\epsilon + 4 \right\rceil. \qquad (B1)$$

Also, $n_f$ can be upper-bounded by $k$ for the error correction code construction.

2. Construct an $(m_e, \rho)$-design [32] with

$$m_e = \frac{\log n}{2} = O\!\left(\log\frac{n_i}{\epsilon}\right), \qquad \rho = \frac{k - 3\log(n_f/\epsilon) - d - 3}{n_f}, \qquad (B2)$$

where the second equation follows from Proposition 10 and Theorem 22 in Ref. [31]; typically $1 < \rho < 1.5$. The design parameter $\rho$ can be viewed as the ratio between the usable min-entropy and the output length. One can simply pick $\rho = 1$ if the output length is to be optimized (Lemma 17 in Ref. [31]). The extractor seed, of length $d$, is composed of blocks of seeds, each of length equal to the square of the smallest power of 2 that is greater than $m_e$. Note that this block design idea was proposed by Raz et al. [30]. Here, we are interested in a weak design with $\rho = 1$, so that most of the randomness can be extracted. According to the explicit weak design proposed by Nisan and Wigderson [4] and proved in Refs. [32, 38], the number of such blocks and hence the seed length are given by

$$b = \lceil \log n_f \rceil - m_d + 1, \qquad d = 2^{2m_d}\, b = O\!\left(\log^2\frac{n_i}{\epsilon}\,\log n_f\right), \qquad m_d = \lceil \log (2m_e) \rceil. \qquad (B3)$$

In fact, any design with $d \ge \lceil \log n \rceil^2 b$ and $\rho = 1$ can be applied here.

3. The $i$-th bit of the $n_f$-bit output is given by the $y_{S_i}$-th bit of the encoded $n$-bit string, where $y_{S_i}$ is the substring of $y$ formed by the bits of $y$ at the positions given by the elements of $S_i$.
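To illustrate the structure of steps 1-3 (not the parameters of Table II), the sketch below is a toy Trevisan extractor in Python. It swaps in two simplifications of our own: the Hadamard code alone as the error correction code, so output bit $i$ is the inner product of $x$ with $y_{S_i}$ modulo 2, and a polynomial Nisan-Wigderson-style design over a small prime field, where $S_p = \{(a, p(a))\}$ for low-degree polynomials $p$. With these choices the input length must equal the set size $q$, which is the motivation for concatenating with a Reed-Solomon code in the actual construction. The parameters are far too small to be secure.

```python
import itertools
import random

def poly_eval(coeffs, a, q):
    """Evaluate sum(coeffs[i] * a**i) mod q by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * a + c) % q
    return result

def polynomial_design(q, degree, m):
    """m subsets of {0, ..., q*q - 1}, one per polynomial p of degree <= `degree`
    over GF(q) (q prime): S_p = {a*q + p(a) : a in GF(q)}.
    Each set has q elements; two distinct sets share at most `degree` elements."""
    sets = []
    for coeffs in itertools.product(range(q), repeat=degree + 1):
        if len(sets) == m:
            break
        sets.append([a * q + poly_eval(coeffs, a, q) for a in range(q)])
    if len(sets) < m:
        raise ValueError("q**(degree+1) polynomials are too few for m output bits")
    return sets

def trevisan_toy(x_bits, seed_bits, q, degree, m):
    """Toy Trevisan extractor: output bit i is the Hadamard-code bit <x, y_{S_i}> mod 2."""
    if len(x_bits) != q or len(seed_bits) != q * q:
        raise ValueError("this toy needs len(x) == q and len(seed) == q*q")
    out = []
    for s in polynomial_design(q, degree, m):
        y_s = [seed_bits[pos] for pos in s]  # seed restricted to the positions in S_i
        out.append(sum(xb & yb for xb, yb in zip(x_bits, y_s)) % 2)
    return out

rng = random.Random(2024)  # pseudo-random stand-ins for raw data and seed
q, degree, m = 7, 1, 20
x = [rng.randrange(2) for _ in range(q)]
y = [rng.randrange(2) for _ in range(q * q)]
print(trevisan_toy(x, y, q, degree, m))
```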



2. Implementation 

The choice of block size not only determines the seed cost and the security parameter of the random output but also affects the computational complexity. For demonstration purposes, we pick a set of parameters for Trevisan's extractor, listed in Table II, which can be run sufficiently fast on a personal computer.



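| Extraction efficiency | RS field GF($2^{m_e}$) | Design field GF($2^{m_d}$) | Input | Output |
|---|---|---|---|---|
| $\rho = 1$ | $m_e = 128$ | $m_d = 8$ | $n_i = 2^{15}$ | $n_f = 2^{14}$ |

| Security parameter | ECC codeword | Blocks | Seed |
|---|---|---|---|
| $\epsilon$ | $n = 2^{2m_e}$ | $b = 7$ | $d = 2^{2m_d}\, b$ |

TABLE II: A parameter set for Trevisan's extractor.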
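Evaluating Eq. (B3) with the Table II values, reading $m_d$ as $\lceil \log(2m_e) \rceil$ (our interpretation), reproduces $m_d = 8$ and $b = 7$ and shows that the seed is about 28 times longer than the output.

```python
import math

m_e = 128       # Reed-Solomon field parameter, GF(2^m_e) (Table II)
n_f = 2 ** 14   # output length in bits (Table II)

m_d = math.ceil(math.log2(2 * m_e))       # design field GF(2^m_d), our reading of (B3)
b = math.ceil(math.log2(n_f)) - m_d + 1   # number of seed blocks
d = 2 ** (2 * m_d) * b                    # seed length in bits
print(m_d, b, d, d / n_f)                 # seed versus output length
```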



In this case, the random seed is longer than the output; one can concatenate a hashing-based extractor to minimize the entropy loss [31]. We pick an output length of $n_f$ = 1 Mb. On the one hand, too large an $n_f$ will slow down the extractor considerably owing to the $O(n^2)$ complexity with respect to the input length; on the other hand, too small an $n_f$ will result not only in a high seed cost but also in degraded security (a larger security parameter $\epsilon$).

Careful analysis of the computational complexity is essential to understanding whether our implementation is tractable with reasonable computational power. The complexity analysis of the combinatorial design in Table III demonstrates that the most economical parameter in terms of rate is $n_f = 2^{14}$. A smaller parameter renders the design inefficient due to the associated high key cost, and a larger parameter results in unwieldy complexity growth.



| $\log_2 n_f$ | Experimental # of GF($2^m$) operations | Theoretical # of GF($2^m$) operations | Experimental # of GF($2^m$) operations per $n_f$ | Real time [s] |
|---|---|---|---|---|
| 10 | 65280 | 262144 | 63.75 | 41.1934 |
| 11 | 196352 | 786432 | 95.875 | 124.8 |
| 12 | 458496 | 2097152 | 111.9375 | 300.81 |
| 13 | 982784 | 5242880 | 119.96875 | 685.91 |
| 14 | 2031360 | 12582912 | 123.984375 | 1603.8 |
| 15 | 4128512 | 29360128 | 125.99 | 3960.4 |
| 16 | 8322816 | 67108864 | 126.9960938 | 10911 |

TABLE III: Real-time profile of the speed of the combinatorial design. Parameters are selected to yield the highest generation rate. Number-theoretical operations in GF($2^m$) dominate the speed performance of the ECC and determine the real-time performance and bit rate (per second).

As shown in Table IV, the top generation rate of our extractor is 706.8 bit/s; the low speed of the extractor is a consequence of the lack of an efficient implementation of finite-field operations. Although slow, the output of Trevisan's extractor passes all the Diehard statistical tests. This gain in output quality comes at the cost of speed, and the severe speed restriction limits the use of Trevisan's extractor in real-time applications.
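Since operations in $\mathrm{GF}(2^m)$ dominate the running time, it may help to see what a single such operation involves in software. The sketch below multiplies two field elements by shift-and-XOR (carry-less) multiplication with reduction modulo an irreducible polynomial. We use $\mathrm{GF}(2^8)$ with the AES polynomial $x^8+x^4+x^3+x+1$ purely as an example; this is an assumption, not necessarily the field or algorithm used in our implementation, and `gf2m_mul` is a hypothetical name.

```python
def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Multiply a and b in GF(2^m); `poly` is the irreducible modulus including the x^m term."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce once the degree reaches m
    return result


if __name__ == "__main__":
    AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1
    print(hex(gf2m_mul(0x57, 0x83, 8, AES_POLY)))  # -> 0xc1 (FIPS-197 example)
```

Each multiplication costs up to $m$ shift/XOR steps; processors with carry-less multiply instructions (for example x86 PCLMULQDQ) can perform the same operation far faster.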

Our implementation runs on an ordinary PC; a mainframe computer could perform the number-theoretical operations much faster. Furthermore, as a future direction, implementing the extractor on a graphics processing unit (GPU) platform would allow us to exploit the intrinsic parallelism of the extractor much more efficiently through the GPU's multi-threading capability.
| $n_f$ ($\log_2 n_f$) | Experimental # of $\mathrm{GF}(2^m)$ operations | Theoretical # of $\mathrm{GF}(2^m)$ operations | Experimental # of $\mathrm{GF}(2^m)$ operations per $n_f$ | Real time [s] | Bit rate [bit/s] |
|---|---|---|---|---|---|
| 1024 (10) | 15360 | 16384 | 15 | 1.4488 | 706.8 |
| 2048 (11) | 63488 | 65536 | 31 | 5.9326 | 345.21121 |
| 4096 (12) | 258048 | 262144 | 63 | 23.5451 | 173.96401 |
| 8192 (13) | 1040384 | 1048576 | 127 | 95.72 | 85.58 |
| 16384 (14) | 4177920 | 4194304 | 255 | 380.19 | 43.1 |
| 32768 (15) | 16744448 | 16777216 | 511 | 1536.8 | 21.32 |

TABLE IV: Real-time profile of the speed of the error control code (ECC). Parameters are selected to yield the highest generation rate. Number-theoretical operations in $\mathrm{GF}(2^m)$ dominate the speed performance of the ECC and determine the real-time speed and bit rate (per second).



Appendix C: Statistical test results 



We employ three statistical test suites, Diehard, NIST, and TestU01 [33], to evaluate the randomness of the outputs of the Toeplitz-hashing extractor and Trevisan's extractor. The test results are shown in Tables V, VI, and VII. The outputs from both extractors pass all the standard statistical tests. Given the limited computational power available for Trevisan's extractor, we skip the NIST and TestU01 tests for its output.

Without post-processing, the raw data fail all the statistical tests. This is mainly due to the classical noise mixed into the raw data and to the fact that the measured quantum fluctuations follow a Gaussian rather than a uniform distribution. This demonstrates the need for effective post-processing in a QRNG.
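A quick simulation illustrates why a Gaussian signal digitized by an 8-bit ADC fails uniformity tests even in the absence of classical noise. The sketch assumes, hypothetically, a Gaussian centred at code 128 with a standard deviation of 32 codes, clipped to [0, 255], and compares Pearson's chi-square statistic with that of uniform 8-bit samples; for a uniform source the statistic should be close to its 255 degrees of freedom. The helper names are our own.

```python
import random
from typing import List


def chi_square_uniform(samples: List[int], bins: int = 256) -> float:
    """Pearson chi-square statistic of integer samples against a uniform distribution."""
    counts = [0] * bins
    for s in samples:
        counts[s] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def adc_8bit(x: float) -> int:
    """Hypothetical 8-bit ADC: round and clip to the code range 0..255."""
    return min(255, max(0, round(x)))


if __name__ == "__main__":
    rng = random.Random(2024)
    n = 100_000
    gaussian = [adc_8bit(rng.gauss(128.0, 32.0)) for _ in range(n)]
    uniform = [rng.randrange(256) for _ in range(n)]
    print(chi_square_uniform(gaussian))  # on the order of 1e5: clearly non-uniform
    print(chi_square_uniform(uniform))   # close to 255 for a uniform source
```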

For comparison, we also perform the statistical tests on the output of a pseudo-RNG from MATLAB 2007, which generates uniformly distributed integers from 0 to 255 (emulating an 8-bit ADC output). The results are shown in Tables V, VI, and VII; the pseudo-RNG does not pass all the tests.
| Statistical test | Pseudo-RNG: result | Raw data: result | Trevisan's: P-value | Trevisan's: result | Toeplitz-hashing: P-value | Toeplitz-hashing: result |
|---|---|---|---|---|---|---|
| Birthday spacings [KS] | success | failure | 0.82263 | success | 0.340863 | success |
| Overlapping permutations | success | failure | 0.679927 | success | 0.403824 | success |
| Ranks of 31x31 matrices | success | failure | 0.419095 | success | 0.349441 | success |
| Ranks of 31x32 matrices | success | failure | 0.715705 | success | 0.816752 | success |
| Ranks of 6x8 matrices [KS] | success | failure | 0.195485 | success | 0.408573 | success |
| Bit stream test | success | failure | 0.048260 | success | 0.281680 | success |
| Monkey test OPSO | success | failure | 0.027300 | success | 0.892600 | success |
| Monkey test OQSO | success | failure | 0.023200 | success | 0.267200 | success |
| Monkey test DNA | failure | failure | 0.038000 | success | 0.736700 | success |
| Count 1's in stream of bytes | success | failure | 0.380162 | success | 0.639691 | success |
| Count 1's in specific bytes | failure | failure | 0.020417 | success | 0.373149 | success |
| Parking lot test [KS] | failure | failure | 0.629013 | success | 0.151689 | success |
| Minimum distance test [KS] | success | failure | 0.019499 | success | 0.688780 | success |
| Random spheres test [KS] | success | failure | 0.488703 | success | 0.939227 | success |
| Squeeze test | success | failure | 0.238004 | success | 0.155403 | success |
| Overlapping sums test [KS] | success | failure | 0.022339 | success | 0.909675 | success |
| Runs test (up) [KS] | failure | failure | 0.403504 | success | 0.181024 | success |
| Runs test (down) [KS] | success | failure | 0.119132 | success | 0.668512 | success |
| Craps test, no. of wins | success | failure | 0.757521 | success | 0.826358 | success |
| Craps test, throws/game | success | failure | 0.179705 | success | 0.862986 | success |


TABLE V: Diehard. Data size is 240 Mbits. For tests that produce multiple P-values, a Kolmogorov-Smirnov (KS) test is used to obtain a final P-value, which measures the uniformity of the multiple P-values. A test is passed if all final P-values satisfy $0.01 < P < 0.99$.
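The KS step can be sketched as follows: sort the individual P-values, measure their maximum distance $D$ from the uniform CDF, and convert $D$ into a final P-value. This pure-Python sketch uses the common asymptotic Kolmogorov approximation with the $\sqrt{n} + 0.12 + 0.11/\sqrt{n}$ correction, which is an assumption on our part; Diehard's own routine may compute the final P-value differently, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def ks_statistic_uniform(pvals: Sequence[float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D of P-values against Uniform(0, 1)."""
    xs = sorted(pvals)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def kolmogorov_tail(lam: float) -> float:
    """Asymptotic P(K > lam) for the Kolmogorov distribution."""
    if lam < 0.2:
        return 1.0  # the tail equals 1 to double precision here
    if lam < 1.0:
        # small-lambda series: P(K <= lam) = sqrt(2 pi)/lam * sum exp(-(2k-1)^2 pi^2 / (8 lam^2))
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2)) for k in range(1, 20))
        return 1.0 - math.sqrt(2 * math.pi) / lam * s
    # large-lambda series: P(K > lam) = 2 * sum (-1)^(k-1) exp(-2 k^2 lam^2)
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 20))
    return 2.0 * s


def ks_final_pvalue(pvals: Sequence[float]) -> float:
    """Final P-value measuring how uniform a collection of P-values is."""
    root_n = math.sqrt(len(pvals))
    lam = (root_n + 0.12 + 0.11 / root_n) * ks_statistic_uniform(pvals)
    return kolmogorov_tail(lam)


def diehard_pass(p: float, low: float = 0.01, high: float = 0.99) -> bool:
    """Pass criterion of Table V: 0.01 < P < 0.99."""
    return low < p < high
```

Under this criterion an overly perfect fit (a final P-value close to 1) also counts as a failure, just like a strongly non-uniform set of P-values.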



Appendix D: Autocorrelation 

Statistical tests verify the quality of randomness, but each individual test probes only one aspect of randomness (e.g., bias or repetition). Another way to verify randomness is to evaluate the autocorrelation and check for the absence of periodic correlations.
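One common definition of the normalized autocorrelation at delay $k$ is $R(k) = \sum_{i=1}^{n-k}(x_i-\bar x)(x_{i+k}-\bar x) \big/ \sum_{i=1}^{n}(x_i-\bar x)^2$. The exact normalization behind our figures is not spelled out here, so the following sketch, with the hypothetical helper `autocorrelation`, is an assumption; it applies equally to a bit stream or to 8-bit samples.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Normalized (biased) sample autocorrelation R(0..max_lag); R(0) == 1."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    if var == 0:
        raise ValueError("constant sequence has undefined autocorrelation")
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / var for k in range(max_lag + 1)]


if __name__ == "__main__":
    alternating = [0, 1] * 50            # strongly periodic test sequence, n = 100
    r = autocorrelation(alternating, 2)
    print([round(v, 2) for v in r])      # -> [1.0, -0.99, 0.98]
```

A periodic correlation shows up as a recurring peak in $R(k)$, as in the alternating sequence above, whereas an ideal random sequence gives values near zero at every nonzero delay.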
| Statistical test | Pseudo-RNG: result | Raw data: result | Toeplitz-hashing: P-value | Toeplitz-hashing: proportion | Toeplitz-hashing: result |
|---|---|---|---|---|---|
| Frequency | success | failure | 0.373625 | 0.9900 | success |
| Block-frequency | success | failure | 0.310049 | 0.9960 | success |
| Cumulative sums | success | failure | 0.422638 | 0.9980 | success |
| Runs | success | failure | 0.703417 | 0.9900 | success |
| LongestRun | success | failure | 0.013569 | 0.9880 | success |
| Rank | success | failure | 0.411840 | 0.9940 | success |
| FFT | success | failure | 0.987079 | 0.9860 | success |
| NonOverlappingTemplate | failure | failure | 0.727851 | 0.9820 | success |
| OverlappingTemplate | success | failure | 0.110083 | 0.9780 | success |
| Universal | success | failure | 0.962688 | 0.9880 | success |
| ApproximateEntropy | success | failure | 0.674543 | 0.9920 | success |
| Random-excursions | success | failure | 0.409207 | 0.9900 | success |
| Random-excursions Variant | success | failure | 0.426358 | 0.9840 | success |
| Serial | success | failure | 0.217570 | 0.9860 | success |
| Linear-complexity | success | failure | 0.657833 | 0.9940 | success |



TABLE VI: NIST. Data size is 3.25 Gbits (500 sequences of about 6.5 Mbits each). To pass a test, the P-value should be larger than the significance level $\alpha = 0.01$, and the proportion of sequences satisfying $P > \alpha$ should be greater than 0.976. Where a test has multiple P-values, the worst case is selected.
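The 0.976 threshold is consistent with the confidence interval recommended in NIST SP 800-22, $\hat p \pm 3\sqrt{\hat p(1-\hat p)/m}$ with $\hat p = 1-\alpha$ and $m$ tested sequences: for $\alpha = 0.01$ and $m = 500$, the lower bound is $0.99 - 3\sqrt{0.99 \times 0.01/500} \approx 0.9767$. The short check below reproduces this number and applies it to the proportions in Table VI. It is a sketch with a hypothetical function name, and for simplicity it uses $m = 500$ for every row, whereas NIST's random-excursion tests evaluate only the eligible subset of sequences.

```python
import math


def nist_min_proportion(alpha: float, m: int) -> float:
    """Lower edge of the acceptable pass proportion: p_hat - 3*sqrt(p_hat*(1 - p_hat)/m)."""
    p_hat = 1.0 - alpha
    return p_hat - 3.0 * math.sqrt(p_hat * (1.0 - p_hat) / m)


# Proportions of the Toeplitz-hashing output, copied from Table VI
PROPORTIONS = {
    "Frequency": 0.9900, "Block-frequency": 0.9960, "Cumulative sums": 0.9980,
    "Runs": 0.9900, "LongestRun": 0.9880, "Rank": 0.9940, "FFT": 0.9860,
    "NonOverlappingTemplate": 0.9820, "OverlappingTemplate": 0.9780,
    "Universal": 0.9880, "ApproximateEntropy": 0.9920, "Random-excursions": 0.9900,
    "Random-excursions Variant": 0.9840, "Serial": 0.9860, "Linear-complexity": 0.9940,
}

if __name__ == "__main__":
    threshold = nist_min_proportion(alpha=0.01, m=500)
    print(round(threshold, 4))  # -> 0.9767
    failing = [name for name, p in PROPORTIONS.items() if p <= threshold]
    print(failing)              # -> []
```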

The autocorrelation results of the raw data are shown in Fig. 5. Since the raw data from the QRNG are digitized by an 8-bit ADC, the autocorrelation between bits, shown in Fig. 5(a), is significant only up to the 7th-bit delay; beyond that, it is negligible. The low autocorrelation between samples, shown in Fig. 5(b), supports the assumption of an iid raw sequence; the slightly larger coefficient at the 2nd-sample delay can be attributed to the finite bandwidth of the photodetector. We remark that the correlation among samples cannot reach zero for a practical detector with finite bandwidth. Eve might exploit this correlation to gain partial information about the generated random numbers. In principle, we can remove Eve's information by using the same randomness extractor developed in this paper.

| Statistical test | Pseudo-RNG: result | Raw data: result | Toeplitz-hashing: P-value | Toeplitz-hashing: result |
|---|---|---|---|---|
| BirthdaySpacings | success | failure | 0.5300 | success |
| Collision | success | failure | 0.1500 | success |
| Gap Chi-square | success | failure | 0.8900 | success |
| SimpPoker Chi-square | success | failure | 0.3500 | success |
| CouponCollector Chi-square | success | failure | 0.6700 | success |
| MaxOft Chi-square | success | failure | 0.6900 | success |
| MaxOft Anderson-Darling | success | failure | 0.9500 | success |
| WeightDistrib Chi-square | success | failure | 0.5600 | success |
| MatrixRank Chi-square | success | failure | 0.5100 | success |
| HammingIndep Chi-square | success | failure | 0.1000 | success |
| RandomWalk1 H Chi-square | success | failure | 0.9931 | success |
| RandomWalk1 M Chi-square | success | failure | 0.8300 | success |
| RandomWalk1 J Chi-square | success | failure | 0.9400 | success |
| RandomWalk1 R Chi-square | success | failure | 0.7000 | success |
| RandomWalk1 C Chi-square | success | failure | 0.6600 | success |

TABLE VII: TestU01 (SmallCrush). Given the data size and computational power required by the Crush and BigCrush batteries of TestU01, we perform only the SmallCrush test here. Data size is 8 Gbits. The P-value of a failing test converges to 0 or 1 (eps or 1 - eps). Where a test has multiple P-values, the worst case is selected.

After post-processing, the autocorrelation of the outputs from both extractors is substantially improved, as shown in Fig. 6. In theory, for an infinite iid random sequence, the autocorrelation vanishes at every nonzero delay, which corresponds to a broadband white spectrum. In practice, however, owing to the inevitable presence of bias and the finite data size, the autocorrelation of a data sequence can never reach 0. A back-of-the-envelope calculation [39] shows the effect of truncation on the autocorrelation coefficient: by the central limit theorem, one standard deviation corresponds to a range of autocorrelation of $[-1/\sqrt{n}, 1/\sqrt{n}]$, where $n$ is the data size.
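The $1/\sqrt{n}$ estimate can be checked numerically. The sketch below draws an iid bit sequence from Python's Mersenne Twister (a stand-in assumption for ideal random bits) and compares its sample autocorrelation at nonzero delays with $\sigma = 1/\sqrt{n}$; the helper `autocorrelation_at` is hypothetical. For the 10-Mbit record of Fig. 5, the same estimate gives $1/\sqrt{10^7} \approx 3.2 \times 10^{-4}$.

```python
import math
import random
from typing import List, Sequence


def autocorrelation_at(x: Sequence[int], lags: range) -> List[float]:
    """Normalized sample autocorrelation of a sequence at the given nonzero lags."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / var for k in lags]


if __name__ == "__main__":
    n = 20_000
    rng = random.Random(11)
    bits = [rng.getrandbits(1) for _ in range(n)]
    sigma = 1 / math.sqrt(n)                 # one standard deviation for iid data
    r = autocorrelation_at(bits, range(1, 21))
    inside = sum(abs(v) <= 3 * sigma for v in r)
    print(f"sigma = {sigma:.2e}; {inside}/20 lags within 3 sigma")
    print(f"bound for a 10-Mbit record: {1 / math.sqrt(10_000_000):.1e}")  # -> 3.2e-04
```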
FIG. 5: All normalized correlations are evaluated from a 10-Mbit record of the raw data. (a) The average value is $9.5 \times 10^{-4}$, and the most significant correlations are within 8 bits (one sample digitized by an 8-bit ADC). (b) The average value is $4.9 \times 10^{-4}$; the correlation among samples cannot reach zero for a practical detector with finite bandwidth. (c) The average value is $-9.2 \times 10^{-5}$. (d) The average value is 1.2 x 10, which demonstrates the absence of long-period autocorrelation.
We remark that our post-processing method can easily be adapted to the QRNG developed in [13] with minor modifications.

Roughly speaking, the min-entropy can be regarded as a lower bound on the randomness one can extract, whereas the Shannon entropy can be treated as an upper bound. The min-entropy is never greater than the corresponding Shannon entropy.
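As a worked illustration of the ordering $H_{\min} \le H$, the sketch below computes both quantities for a discretized Gaussian, with $H_{\min} = -\log_2 \max_i p_i$ and $H = -\sum_i p_i \log_2 p_i$. The Gaussian centred in the 8-bit range with a standard deviation of 32 codes is a hypothetical choice, not a measured parameter of our QRNG; clipped tails are lumped into the end codes, and the function names are our own.

```python
import math
from typing import List


def gaussian_adc_probs(mu: float, sigma: float, bits: int = 8) -> List[float]:
    """Probability of each ADC code for a Gaussian input; codes cover unit-width bins."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    top = 2 ** bits - 1
    probs = []
    for code in range(top + 1):
        lo = -math.inf if code == 0 else code - 0.5   # lowest code absorbs the lower tail
        hi = math.inf if code == top else code + 0.5  # highest code absorbs the upper tail
        probs.append(cdf(hi) - cdf(lo))
    return probs


def min_entropy(probs: List[float]) -> float:
    return -math.log2(max(probs))


def shannon_entropy(probs: List[float]) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    p = gaussian_adc_probs(mu=127.5, sigma=32.0)
    print(f"H_min = {min_entropy(p):.3f} bits, H = {shannon_entropy(p):.3f} bits (max 8)")
```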

In principle, one can go beyond this limitation by increasing the laser power. However, the upper bound of the min-entropy increases only logarithmically with the optical power.
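A rough back-of-the-envelope check of the logarithmic scaling, under assumptions we state explicitly: the measured signal is Gaussian with variance proportional to the optical power $P$ (shot-noise scaling, so $\sigma \propto \sqrt{P}$), the ADC bin width $\Delta$ is fixed, and clipping is ignored. Then $\max_i p_i \approx \Delta/(\sigma\sqrt{2\pi})$, so $H_{\min} \approx \log_2(\sigma\sqrt{2\pi}/\Delta)$ grows by only about half a bit per doubling of $P$. The function name and the starting width are hypothetical.

```python
import math


def central_bin_min_entropy(sigma: float, delta: float = 1.0) -> float:
    """-log2 of the probability of the central bin [-delta/2, delta/2] of N(0, sigma^2)."""
    p_max = math.erf(delta / (2.0 * math.sqrt(2.0) * sigma))
    return -math.log2(p_max)


if __name__ == "__main__":
    sigma_at_unit_power = 4.0  # hypothetical noise width, in ADC bins, at power P = 1
    for power in (1, 2, 4, 8, 16):
        sigma = sigma_at_unit_power * math.sqrt(power)  # assumed sigma ~ sqrt(P)
        # each doubling of the power adds roughly 0.5 bit
        print(power, round(central_bin_min_entropy(sigma), 3))
```

With an 8-bit ADC the min-entropy per sample is in any case capped at 8 bits.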
Toeplitz hashing can be implemented much faster in hardware.
http://www.stat.fsu.edu/pub/diehard/ 
http://csrc.nist.gov/groups/ST/toolkit/rng/